Adding timeout for relay connection #681
Conversation
Walkthrough: Added a configurable relay connection timeout setting (`relay_connection_timeout_secs`); after initiating the relay connection, startup now waits up to the timeout and fails if no relay connects.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant App as Application
    participant Config as Configuration
    participant Util as Connection Utility
    participant Client as Nostr Client
    App->>Config: Load settings (relay_connection_timeout_secs)
    Config-->>App: Return Settings (timeout = 5s)
    App->>Util: connect_nostr(nostr_settings)
    Util->>Util: derive timeout from settings
    Util->>Client: client.connect()
    Client-->>Util: connection initiated
    Util->>Client: client.wait_for_connection(timeout)
    Client-->>Util: connected relays count
    alt >=1 relay connected
        Util-->>App: Ok(client) (ready)
    else 0 relays connected
        Util-->>App: Err(No relays connected)
    end
```
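The "derive timeout from settings" step of the flow above can be sketched in plain Rust. The `NostrSettings` struct here is a hypothetical stand-in for the real configuration type in `src/config/types.rs`, reduced to the one new field:

```rust
use std::time::Duration;

/// Hypothetical mirror of the new configuration field; the real struct
/// in src/config/types.rs has more fields.
struct NostrSettings {
    relay_connection_timeout_secs: u64,
}

/// Derive the wait-for-connection timeout from settings, as in the
/// "derive timeout from settings" step of the diagram.
fn connection_timeout(settings: &NostrSettings) -> Duration {
    Duration::from_secs(settings.relay_connection_timeout_secs)
}

fn main() {
    let settings = NostrSettings { relay_connection_timeout_secs: 5 };
    println!("relay connection timeout: {:?}", connection_timeout(&settings));
}
```

The resulting `Duration` is what would be passed to `client.wait_for_connection(timeout)` in the diagram.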
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/util.rs`:
- Around line 782-785: After calling client.connect().await and
client.wait_for_connection(timeout).await in connect_nostr, verify that at least
one relay actually connected and return an Err when none did; specifically,
after wait_for_connection(timeout) check the client's relay connection
count/status (e.g., using the client instance methods that expose connected
relays or relay statuses) and if zero connected relays, log/return an error
instead of returning Ok(client), otherwise continue returning Ok(client).
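The check the comment asks for can be sketched as a small helper. This is a hypothetical illustration, not the actual `src/util.rs` code: `ensure_relays_connected` and its `String` error type are made up for the example, and `connected` stands in for whatever connected-relay count the client instance exposes after `wait_for_connection(timeout)` resolves:

```rust
/// Hypothetical helper mirroring the suggested check: `connected` is the
/// number of relays that actually reached a connected state after
/// wait_for_connection(timeout) returned.
fn ensure_relays_connected(connected: usize) -> Result<(), String> {
    if connected == 0 {
        // No relay came up within the timeout: return Err instead of Ok(client).
        Err("no relays connected within timeout".to_string())
    } else {
        Ok(())
    }
}

fn main() {
    assert!(ensure_relays_connected(3).is_ok());
    assert!(ensure_relays_connected(0).is_err());
    println!("relay-count checks passed");
}
```

In `connect_nostr` the `Err` branch would be returned to the caller so startup fails fast rather than proceeding with a client that has zero usable relays.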
📒 Files selected for processing (4)
- settings.tpl.toml
- src/app/context.rs
- src/config/types.rs
- src/util.rs
00b7977 to cdebde0
🧹 Nitpick comments (1)
src/util.rs (1)
794-799: Consider startup resilience as a future enhancement. Currently, if all configured relays are temporarily unreachable at startup (e.g., a transient network issue), Mostro will fail to start. This is consistent with the database connection behavior (both use fail-fast patterns).
For future consideration: adding retry with exponential backoff before failing could improve resilience against brief network disruptions during deployment/restart. However, the current behavior is reasonable for this PR's scope of adding timeout handling.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/util.rs`:
- Around lines 794-799: The code returns an error immediately when connected == 0
after waiting nostr_settings.relay_connection_timeout_secs, which makes startup
fail-fast; instead implement a retry loop with exponential backoff before
returning Err: wrap the existing check in a limited number of retries
(configurable via a new setting like nostr_settings.relay_connection_retry_attempts)
and on each failure wait an exponentially increasing delay (base + jitter) before
re-checking connected; keep the existing error path as a final fallback and log
each retry attempt, including the current attempt number, delay, and
nostr_settings.relay_connection_timeout_secs, so callers can trace retries.
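As a rough illustration of the backoff schedule the prompt describes, the per-attempt delay can be computed as below. All names and constants here are illustrative (there is no such helper in Mostro), and jitter is omitted for determinism; a real implementation would add a small random component:

```rust
use std::time::Duration;

/// Compute the delay before retry `attempt` (1-based): base * 2^(attempt-1),
/// capped at `max`. Jitter is deliberately left out of this sketch.
fn backoff_delay(base: Duration, attempt: u32, max: Duration) -> Duration {
    base.checked_mul(2u32.saturating_pow(attempt.saturating_sub(1)))
        .unwrap_or(max) // treat overflow as "use the cap"
        .min(max)
}

fn main() {
    // With base = 1s and cap = 30s: 1s, 2s, 4s, 8s, 16s, 30s, 30s, ...
    for attempt in 1..=7 {
        let delay = backoff_delay(Duration::from_secs(1), attempt, Duration::from_secs(30));
        println!("attempt {attempt}: wait {delay:?} before re-checking connected relays");
    }
}
```

The retry loop would call this between re-checks of the connected-relay count, returning the existing `Err` only after the configured number of attempts is exhausted.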
📒 Files selected for processing (4)
- settings.tpl.toml
- src/app/context.rs
- src/config/types.rs
- src/util.rs
✅ Files skipped from review due to trivial changes (1)
- settings.tpl.toml
🚧 Files skipped from review as they are similar to previous changes (1)
- src/app/context.rs
Catrya
left a comment
Thanks for the contribution @sergey3bv
However, this PR doesn't address the issue it references (mobile #548). That issue is about the Mostro Mobile app hanging when a relay is down; it's an app-side problem caused by dart_nostr connecting to relays sequentially with no effective connection timeout.
The mostro daemon uses nostr-sdk, which handles this correctly: client.connect().await is non-blocking, it spawns independent background tasks per relay in parallel with per relay timeouts, so a dead relay never blocks the others.
If all relays are down at startup, operations like send_event simply fail individually while nostr-sdk keeps retrying with adaptive backoff, and reconnects automatically when relays come back up. Currently, when a relay is down, mostrod continues working normally, you can see the failing relay in the logs but it doesn't affect operations.
So the scenario this PR guards against is already handled gracefully. I'd suggest we close this one to keep things simple and avoid adding a config field and logic for something already covered, but I'll leave that to your consideration, @grunch @arkanoider.
Indeed, it was not there when I started working, so the context was different
arkanoider
left a comment
@sergey3bv thanks for your effort, but I confirm, as @Catrya pointed out, that rust-nostr manages relay connections smoothly: every connection is spawned in a separate thread/task.
closing this PR
Should solve MostroP2P/mobile#548